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Accurate approximations to density functionals have recently been obtained via machine learning 
(ML). By applying ML to a simple function of one variable without any random sampling, we extract 
the qualitative dependence of errors on hyperparameters. We find universal features of the behavior 
in extreme limits, including both very small and very large length scales, and the noise-free limit. 
We show how such features arise in ML models of density functionals. 


I. INTRODUCTION 

Machine learning (ML) is a powerful data-driven 
method for learning patterns in high-dimensional spaces 
via induction, and has had widespread success in many 
fields including more recent applications in quantum 
chemistry and materials science [1-9]. Here we are in¬ 
terested in applications of ML to construction of density 
functionals [10-14], which have focused so far on approx¬ 
imating the kinetic energy (KE) of non-interacting elec¬ 
trons. An accurate, general approximation to this could 
make orbital-free DFT a practical reality. 

However, ML methods have been developed within the 
areas of statistics and computer science, and have been 
applied to a huge variety of data, including neuroscience, 
image and text processing, and robotics [15]. Thus, they 
are quite general and have not been tailored to account 
for specific details of the quantum problem. For example, 
it was found that a standard method, kernel ridge regres¬ 
sion, could yield excellent results for the KE functional, 
while never yielding accurate functional derivatives. The 
development of methods for bypassing this difficulty has 
been important for ML in general [13]. 

ML provides a whole suite of tools for analyzing data, 
fitting highly non-linear functions, and dimensionality re¬ 
duction [16]. More importantly in this context, ML pro¬ 
vides a completely different way of thinking about elec¬ 
tronic structure. The traditional ab-initio approach [17] 
to electronic structure involves deriving carefully con¬ 
structed approximations to solving the Schrodinger equa¬ 
tion, based on physical intuition, exact conditions and 
asymptotic behaviors [18]. On the other hand, ML learns 
by example. Given a set of training data, ML algorithms 
learn via induction to predict new data. ML provides 
limited interpolation over a specific class of systems for 
which training data is available. 

A system of N interacting electrons with some exter¬ 
nal potential is characterized by a 3iV coordinate wave- 


function, which becomes computationally demanding for 
large N. In the mid 1960’s, Hohenberg and Kohn proved 
a one-to-one correspondence between the external poten¬ 
tial of a quantum system and its one-electron ground- 
state density [19], showing that all properties are func¬ 
tionals of the ground-state density alone, which can in 
principle be found from a single Euler equation for the 
density. Although these fundamental theorems of DFT 
proved the existence of a universal functional, essentially 
all modern calculations use the KS scheme [20], which 
is much more accurate, because the non-interacting KE 
is found exactly by using an orbital-scheme [21]. This 
is far faster than traditional approaches for large N, 
but remains a bottleneck. If a sufficiently accurate den¬ 
sity functional for the non-interacting electrons could 
be found, it could increase the size of computationally 
tractable systems by orders of magnitude. 

The Hohenberg-Kohn theorem guarantees that all 
properties of the system can be determined from the elec¬ 
tronic density alone. The basic tenet of ML is that a 
pattern must exist in the data in order for learning to 
be possible. Thus, DFT seems an ideal case to apply 
ML. ML learns the underlying pattern in solutions to 
the Schrodinger equation, bypassing the need to directly 
solve it. The HK theorem is a statement concerning the 
minimal information needed to do this for an arbitrary 
one-body potential. 

Some of us recently used ML to learn the non¬ 
interacting KE of fermions in a one-dimensional box 
subject to smooth external potentials [10] and of a 
one-dimensional model of diatomics where we demon¬ 
strated the ability of ML to break multiple bonds self- 
consistently via an orbital-free density functional theory 
(DFT) [11]. Such KE data is effectively noise-free, since 
it is generated via deterministic reference calculations, 
by solving the Schrodinger equation or KS equations nu¬ 
merically exactly. (The limited precision of the calcu¬ 
lation might be considered “noise,” as different imple- 
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FIG. 1. The error of the model, AT (Hartree), for the KE 
of particles in a box (Section IV) as a function of cr, for fixed 
A = Nt values for each curve are given in the legend. 

mentations might yield answers differing on the order of 
machine precision, but this is negligble.) There is no 
noise, in the traditional sense, as is typically associated 
with experimental data. Note that what is considered 
“noise” depends on what is considered ground truth, i.e., 
the data to be learned. In particular, if a single refer¬ 
ence method is used, its error with respect to a universal 
functional is not considered noise for the ML model. A 
perfect ML model should, at best, precisely reproduce 
the single-reference calculation. 

As an example, in Fig. 1 we plot a measure of the er¬ 
ror of ML for the KE of up to 4 noninteracting spinless 
fermions in a box under a potential with 9 parameters 
(given in detail in Ref. [10]), fitted for different num¬ 
bers of evenly spaced training densities as a function of 
the hyperparameter a (called the length scale), for fixed 
A (a hyperparameter called the regularization strength) 
and several different number of training points Nt- The 
scale is logarithmic [22], so there are large variations in 
the fitted error. We will give a more in-depth analysis of 
the model performance on this data set in a later section 
after we have formally defined the functions and hyperpa¬ 
rameters involved, but for now it is still useful to observe 
the qualitative behaviors that emerge in the figure. Note 
that the curves assume roughly the same shape for each 
Nt over the range of a, and that they all possess distinct 
features in different regimes of a. 

To better understand the behavior with respect to hy¬ 
perparameters seen in Fig 1, we have chosen in this pa¬ 
per to apply them to the prototypical regression problem, 
that of fitting a simple function of one coordinate. We 
also remove all stochastic elements of the procedure, by 
considering data points on uniform grids, defining errors 
in the continuum limit, etc. This is shown in Fig. 2, 
where we plot a measure of the error of ML for a simple 
function cos x, fitted in the region between 0 and 1, inclu¬ 
sive, for several Nt (represented as values on a grid) as 
a function of a. Note the remarkable similarity between 
the features and characteristics of the curves of this fig¬ 
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FIG. 2. Dependence of the model error (as a function of a) 
when fitting cosx between 0 and 1 (Section III) for various 
Nt (shown in the legend) for A = 10~®. 


ure and those of Fig. I (like Fig. 1 before it, we will give 
a more in-depth analysis of Fig. 2 later). We explore the 
behavior of the fitting error as a function of the number 
of training parameters and the hyperparameters that are 
used in kernel ridge regression with Gaussian kernel. We 
find the landscape to be surprisingly rich, and we also 
find elegant simplicities in various limiting cases. After 
this, we will be able to characterize the behavior of ML 
for systems like the one shown in Fig. 1. 


Looking at Fig. 2, we see that the best results (low¬ 
est error) are always obtained from the middle of the 
curves, which can become quite flat with enough training 
data. Thus, any method for determining hyperparame¬ 
ters should usually yield a length scale somewhere in this 
valley. For very small length scales, all curves converge to 
the same poor result, regardless of the number of train¬ 
ing points. On the other hand, notice also the plateau 
structure that develops for very large length scales, again 
with all curves converging to a certain limit. We show for 
which ranges of hyperparameters these plateaus emerge 
and how they can be estimated. We also study and ex¬ 
plain many of the features of these curves. To show the 
value of this study, we then apply the same reasoning to 
the problem that was tackled in Ref. [10, 12], which we 
showcased in Fig. 1. From the machine learning perspec¬ 
tive our study may appear unusual as it considers proper¬ 
ties in data and problems that are uncommon. Namely 
there are only a few noise free data points and all are 
low dimensional. Nevertheless, from the physics point of 
view the toy data considered reflects very well the essen¬ 
tial properties of a highly relevant problem in quantum 
physics: the machine learning of DFT. 
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II. BACKGROUND 

In this work, we will first use ML to fit a very simple 
function of one variable, 

f{x)=cosx, ( 1 ) 

on the interval x G [0,1]. We will focus on exploring the 
properties of ML for this simple function before proceed¬ 
ing to our DFT cases. We choose a set of x-values and 
corresponding f{x) values as the “training data” for ML 
to learn from. In ML, the cc-values {xj} for j = I,..., Nt 
are known as features, and corresponding /-values, {fj}, 
are known as labels. Here is the number of training 
samples. Usually, ML is applied with considerable ran¬ 
dom elements, such as in the choice of data points and se¬ 
lection of test data. In our case, we choose evenly spaced 
training points on the interval [0,1]: Xj = {j—l)/{NT — l) 
for / = I,..., Nt. 

Using this dataset, we apply kernel ridge regression 
(KRR), which is a non-linear form of regression with reg¬ 
ularization to prevent overfitting [16], to fit f{x). The 
general form of KRR is 

Nt 

f^^{x) ='^ajk{x,Xj), (2) 

i=i 

where aj are the weights, k is the kernel (which is a 
measure of similarity between features), and the hyper¬ 
parameter A controls the strength of the regularization 
and is linked to the noise level of the learning problem. 
We use the Gaussian kernel 

fc(a;, a;') = exp (—(a: — a;')^/2cr^) , (3) 

a standard choice in ML that works well for a variety 
of problems. The hyperparameter cr is the length scale 
of the Gaussian, which controls the degree of correlation 
between training points. 

The weights aj are obtained through the minimization 
of the cost function 

Nt 

C{a)='^{f^^{xj)-fj)^+ Xa^Kcx, (4) 

i=i 

where 

a = {ai,.. .,aNT)^ ■ ( 5 ) 

The exact solution is given by 

OL = {K + \I)-^f, (6) 

where I is the Nt x Nt identity matrix, K is the ker¬ 
nel matrix with elements Kij = k{xi,Xj), and / = 
(/i, • • ■, /NtV- 

The two parameters A and a not determined by Eq. (6) 
are called hyperparameters and must be determined from 


the data (see section IIIG). cr can be viewed as the char¬ 
acteristic length scale of the problem being learned (the 
scale on which changes of / take place), as discernible 
from the data (and thus dependent on, e.g., the number 
of training samples). A controls the leeway the model has 
to fit the training points. For small A, the model has to 
fit the training points exactly, whereas for larger A some 
deviation is allowed. Larger values of A therefore cause 
the model to be smoother and vary less, i.e., less prone 
to overfitting. This can be directly seen in Gaussian pro¬ 
cess regression [23], a related Bayesian ML model with 
predictions identical to those of KRR. There, A formally 
is the variance of the assumed additive Gaussian noise in 
values of /. 

KRR is a method of interpolation. Here, we are mainly 
concerned with the error of the machine learning approx¬ 
imation (MLA) to f{x) in the interpolation region, which 
in this case is the interval x G [0,1]. As a measure of this 
error, we define 

Af=['dx{fix)-N^{xN. (7) 

Jo 

In the case of the Gaussian kernel, we can expand this 
and derive the integrals that appear analytically. To sim¬ 
plify the analytical process, we define 

Afo= [ dxf{x), (8) 

Jo 

as the benchmark error when f^^{x) = 0. For the cosine 
function in Eq. (1), 

Afo = [ dx cos^(x) = ^ + « 0.7273. (9) 

Jo 2 4 

Our goal is to characterize the dependence of the per¬ 
formance of the model on the size of the training data set 
(Nt) and the hyperparameters of the model (a. A). For 
this simple model, we discuss different regions of qual¬ 
itative behavior and derive the dependence of Af for 
various limiting values of these hyperparameters; we do 
all of this in the next few sections. In Section IV, we 
discuss how these results can be qualitatively generalized 
for the problem of using ML to learn the KE functional 
for non-interacting fermions in the box for a limited class 
of potentials. 

III. ANALYSIS 

We begin by analyzing the structure of Af as a func¬ 
tion of cr for fixed A and Nt. Fig. 2 shows Af as a 
function of tr for various Nt while fixing A = I0“®. 
The curves have roughly the same “valley” shape for all 
Nt. The bottom of the valley is an order of magnitude 
deeper than the walls and may have multiple local min¬ 
ima. These valleys are nearly identical in shape for suf¬ 
ficiently large Nt, which indicates that this particular 
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FIG. 3. Dependence of the model error (as a function of a) 
for various A with Nt ~ 33. The labels give the value of A 
for each curve. 


feature arises in a systematic manner as increases. 
Moreover, this central valley opens up to the left (i.e., 
smaller tr) as Nt increases— as the training samples be¬ 
come more densely packed, narrower Gaussians are bet¬ 
ter able to interpolate the function smoothly. Thus, with 
more training samples, a wider range of a values will 
yield an accurate model. 

In addition, a “cusp” will begin to appear in the region 
to the left of the valley, and its general shape remains the 
same for increasing Nt- This is another recurring feature 
that appears to develop systematically like the valley. For 
a fixed Nt, and starting from the far left, the Af curve 
begins to decrease monotonically to the right, i.e., as a 
increases. The cusps mark the first break in this mono¬ 
tonic behavior, as Af increases briefly after reaching this 
local minimum before resuming its monotonic decrease 
for increasing a (until this monotonicity is interrupted 
again in the valley region). The cusps shift to the left 
as Nt increases, following the trend of the valleys. This 
indicates that they are a fundamental feature of the Af 
curves and that their appearances coincide with a par¬ 
ticular behavior of the model as it approaches certain a 
values. Note that Af decreases nearly monotonically as 
Nt increases for all a. This is as expected, since each 
additional training sample adds another weighted Gaus¬ 
sian, which should improve the fit. 

Fig. 3 again shows Af as a function of a, but for vari¬ 
ous A with Nt ftxed at 33. As A decreases, Af again de¬ 
creases nearly monotonically and the central region opens 
up to the right (i.e., larger a). Note that the curves 
for each A coincide up to a certain a before they split 
off from the rest, with the lower A-valued curves break¬ 
ing off further along to the right than those with larger 
A. This shows a well-known phenomenon, namely that 
regularization strength A and kernel length scale a both 
give rise to regularization and smoothing [24] . Addition¬ 
ally, we observe the emergence of “plateau”-like struc¬ 
tures on the right. These will be explored in detail in 
Section III A 2. 
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FIG. 4. Model error Af as a function of a for Nt = 20 and 
A = 10“®. We divide the range of a into three qualitatively 
distinct regions I, II and III. The boundaries between the 
regions are given by the vertical dashed lines. 


A. Regions of qualitative behavior 

In Fig. 4, we plot Af as a function of a for fixed A 
and Nt- The three regions labeled I, II, and III denote 
areas of distinct qualitative behavior. They are delin¬ 
eated by two arbitrary boundaries we denote by (7^ (s for 
small, between I and II) and cr/ {I for large, between II 
and III). In region I, Af decreases significantly as a in¬ 
creases. The region ends when there is signihcant overlap 
between neighboring Gaussians (i.e., when k{xj,Xj+i) is 
no longer small). Region II is a “valley” where the global 
minimum for Af resides. Region III begins where the 
valley ends and is populated by “plateaus” that corre¬ 
spond to f^^{x) assuming a polynomial form (see Sec¬ 
tion III A 2). In the following sections, we examine each 
region separately. In particular, we are interested in the 
asymptotic behavior of Af with respect to Nt, cr and A. 


1. Length scale too small 

The ML model for f{x), given in Eq. (2), is a sum 
of weighted Gaussians centered at the training points, 
where the weights aj are chosen to best reproduce the 
unknown f{x). Fig. 5 shows what happens when the 
width of the Gaussian kernel is too small—the model is 
incapable of learning f{x)- We call this the “comb” re¬ 
gion, as the shape of f^^{x) arising from the narrow 
Gaussians resembles a comb. In order for f^^{x) to ac¬ 
curately fit f{x), the weighted Gaussians must have sig¬ 
nificant overlap. This begins when cr is on the order of 
the distance between adjacent training points. A corre¬ 
sponding general heuristic is to use a multiple (e.g., four 
times) of the median nearest neighbor distance over the 
training set [11]. For equally spaced training data in one 
dimension, this is Ax « 1/A^Ti so we define 


0's = I/Nt 


( 10 ) 
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FIG. 5. Comparison of the function f{x) (black dot-dashed) 
and the ML model f^^{x) (red dashed), for Nt = 5, cr = 
0.05 ^ (Js (us = 0.2), and A = 10“®. When summed, the 
weighted Gaussians, ajk{x,Xj) (blue solid), give f^^{x). The 
blue dots show the location of the training points and the 
value of the corresponding weights. In this case, the model is 
in the “comb” region, when a aa. The width of the Gaus¬ 
sians is much smaller than the distance Ax between adjacent 
training points, and so the model cannot reproduce the exact 
function. 
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FIG. 7. Same as Fig. 4, but comparing Af against our expan¬ 
sion of Af for a as (blue dashed), given in Eq. (Al). The 
vertical dashed line shows the boundary as between regions 
I and II. The expansion breaks down before we reach as, as 
the approximation that aj « f(xj) is no longer valid. 



X 

FIG. 8. Same as Fig. 5, but for a = 0.1, and A = 5. In this 
case, known as over-regularization, A is too large, forcing the 
magnitude of the weights aj to be small and preventing the 
model from fitting f{x). 


FIG. 6. Same as Fig. 5, but for a = Oa = 0.2. Here, the 
model is in region II, the optimum region for the model. The 
error in the model is very small for all x in the interpolation 
region. The width of the Gaussians is comparable to the size 
of the interpolation region, and the weights are large. 

to be the boundary between regions I and II. In Fig. 6, as 
the overlap between neighboring Gaussians becomes sig¬ 
nificant the model is able to reproduce the model well but 
still with significant error. Note that the common heuris¬ 
tics of choosing the length scale in radial basis function 
networks [25] are very much in line with this finding. In 
the comb region, Af decreases as a increases in a char¬ 
acteristic way as the Gaussians begin to fill up the space 
between the training points. For A —> 0, the weights are 
approximately given as the values of the function at the 
corresponding training points: 


Thus, for small cr, the weights are independent of a. Let 
be the error of the model when aj = f{xj). 
This approximation, shown in Fig. 7, captures the initial 
decrease of Af as a increases, but breaks down before 
we reach a a- The qualitative nature of this decay is in¬ 
dependent of the type of function f{x), but its location 
and scale will depend on the specifics. 

As cr 0 (for fixed A and A^t), f^^{x) becomes the 
sum of infinitesimally narrow Gaussians. Thus, in this 
limit, the error in the model becomes 

lim Af = Afo. (12) 

(T—rO 

Note that this limit is independent of A and Nt- 

Fig. 8 shows what happens when the regularization 
becomes too strong. (Although shown for cr in region 
I, the qualitative behavior is the same for any cr.) The 
regularization term in Eq. (4) forces the magnitude of 
the weights to be small, preventing f^^{x) from fitting 


fj, a<.aa. 
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f{x). As A —>■ oo, the weights are driven to zero, and so 
we obtain the same limit as in Eq. (12): 

lim Af = Afo. (13) 

A— 


2. Length scale too large 


We define the boundary ai between regions 11 and Ill 
as the last local minimum of Af (with respect to a). 
Thus, in region III (see Figs. 2 and 3) Af is monotoni- 
cally increasing. As a becomes large, the kernel functions 
become wide and fiat over the interpolation region, and 
in the limit cr —>■ oo, f^^{x) is approximately a constant 
over X G [0,1]. For small A, the optimal constant will be 
the average value over the training data 


I 


(T,1/A—>-oo 


(14) 


1=1 


Note that the order of limits is important here: first a —i 
00 , then A —i 0. If the order is reversed, f^^(x) becomes 
the best polynomial fit of order Atp. We will show this 
explicitly for = 2. For smaller a in region III, as 
A decreases, the emergence of “plateau”-like structures 
can be seen (see Fig. 3). As will be shown, these fiat 
areas correspond to the model behaving as polynomial 
fits of different orders. These can be derived by taking 
the limits cr —1 oo and A —1 0 while maintaining cr in a 
certain proportion to A. 

a. Nt = 2 : In this case, the ML function is 


/“L(x) = ai + 02 


(15) 


and the weights are determined by solving 



1 + A 

e-i/ 2 -= 1 + A 



where fj = f{xj). The solution is 

ai = (/i(l + A)-e-i/2"V2)M 
a2 = (/2(l + A)-e-i/2-Vi)M 


(16) 


(17) 

(18) 


where D = det{K + XI) = 1 + 2\ + , pirst, 

we expand in powers of cr as tr —I oo, keeping up to first 
order in l/ 2 cr^: 



ai « ((/i - /2) + fiX + 

(19) 


Oi2 ~ ((/2 - /l) + / 2 A -f fil2(T'^)lD, 

(20) 

where 


D « 2A-b A^ -b 1/20-2. 

(21) 

Finally 


f^^ix) « a -b ( 0 : 2 ( 20 ; — 1) — ax^)/2a'^, 

(22) 


where d = oi -I- 02 . Next, we take A —1 0. In this limit 
D vanishes and the weights diverge. The relative rate 
at which the limits are taken will affect the asymptotic 
behavior of the weights. The form of D suggests we take 

(23) 

where /I is a constant. 

Taking cr —> 00 , we obtain a linear form: 


/r(^) 


Pfl+ f + P{f 2 - fl)x 

P 1 


(24) 


where f = ^{fi + f 2 )- The parameter /3 smoothly con¬ 
nects the constant and linear plateaus. When /? —>■ 0, we 
recover the constant form f^^{x) = f; when /3 —>■ 00 , we 
recover the linear form f^^{x) = fi + x{f 2 — fi)- 

We can determine the shape of the transition between 
plateaus by substituting Eq. (24) for f^^{x) into Eq. (7) 
for Af. For simplicity’s sake, we first define 

hij = f dx x^f^x), (25) 

Jo 


since expressions of this form will show up in subsequent 
derivations in this work. Finally, we obtain 


^ j; —2(/-I-/i/3)/ioi 2/?(/i —/2)/iii 

- 


(3 + 6 ^ + 4 / 32 )/' - / 1 / 2/32 


3(l + /32) 


^ 02 - 


(26) 


In Fig. 9, we compare our numerical Af with the ex¬ 
pansion Eq. (26) showing the transition between the lin¬ 
ear and constant plateaus. In the case of = 2, only 
these two plateaus exist. In general, there will be at most 
Nt plateaus, each corresponding to successively higher 
order polynomial fits. However, not all of these plateaus 
will necessarily emerge for a given Nt', as we will show, 
the plateaus only become apparent when A is sufficiently 
small, i.e., when the asymptotic behavior is reached, and 
when cr and A are proportional in a certain way similar to 
how we defined /3. This analysis reveals the origin of the 
plateaus. In the series expansion for cr —>■ 00 , A —>■ 0, cer¬ 
tain terms corresponding to polynomial forms becomes 
dominant when a and A remain proportional. 

b. Nt = 3 : We proceed in the same manner for 
Nt = 3, using /3 and substituting into the analytical 
form of f^^{x) for this case to obtain an expression for 
Afis (shown in the appendix). This expression is plotted 
in Fig. 10. 

To derive the limiting value of Zi/ at each plateau for 
large Nt and small A, we minimize the cost function 
Eq. (4) (which is equivalent to Eq. (7) in this limit), 
assuming an n-th order polynomial form for f^^{x). We 
define c„ as the limiting value of Af for the n-th order 
plateau: 


Cn — 


lim = min 

crocA" >-oo ^ 


dx{f{x) - y^^ujjX^)'^. 


i=0 


( 27 ) 
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For our case with f{x) = cos(a;), cq = 0.0193. 

For the linear plateau, f^^{x) assumes the linear form 
ax + 5; minimizing Eq. (7) with respect to a and 6, we 
find that 

f^^ix) = {Uhii - 6/ioi)a: + 4/101 - 6/in, (32) 
yielding, via Eq. (27), 

Cl = ho2 — 4 (/igi ~ 3 hoihii + 3 /in) • (33) 

Eor our case with f{x) = cos(a:), Ci = 1 x 10“^. The 
same procedure yields C 2 = 2.25 x 10“®. 

Next, we dehne 


FIG. 9. Comparing the numerical Af (black solid), for Nt = 
2 and A = 10“®, with our asymptotic form Afp (red dashed) 
given by Eq. (26). The asymptotic form accurately recovers 
the behavior of Af in the plateau regions, but fails for small 
a as expected. 
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FIG. 10. Comparing the numerical Af (black solid), for Nt = 
3 and A = 10“®, with our asymptotic form Afp (red dashed). 
The asymptotic form accurately recovers the behavior of Af 
in the plateau regions, but fails for small a as expected. 


For the constant plateau, f^^{x) assumes the constant 
form a; to minimize Eq. (7) with respect to a, we solve 


da 


dx {f{x) 


a)2 = 0 


for a, obtaining 


a = 


dxf{x), 


so that 


= /loi 


Thus, we obtain 


Co = -h, 


102 - 


(28) 


(29) 


(30) 


(31) 


1 

2A1/2ct2 


(34) 


as another parameter to relate a and A. We choose to de¬ 
fine this using the same motivation as for /3, i.e., we exam¬ 
ined our analytical expression for f^^{x) and picked this 
parameter to substitute in order for a and A to remain 
proportional in a specific way as they approach certain 
limits and to see what values Af takes for these limits 
(in particular, we are interested to see if we can obtain 
all 3 plateaus for Nt = 3). In doing this, we obtain an 
expansion analogous to that of Eq. (26) (shown as Af^ 
in the appendix). 

We plot this expression in Fig. 11, alongside our numer¬ 
ical Af and the plateau limits, for Nt = 3 and varying 
A. Note that the curves of the expansions are contingent 
on the value of A; we do not retrieve all 3 plateaus for 
all of the expansions. Only the expansion curves corre¬ 
sponding to the smallest A (10“^°, the blue curve) and 
second smallest A (10“®, the yellow curve) show broad, 
definitive ranges of a where they take the value of each 
of the 3 plateaus (for the dashed blue curve, this is ev¬ 
ident for Cl and C 2 ; the curve approaches cq for larger 
a ranges not shown in the figure), suggesting a specific 
proportion between cr and A is needed for this to occur. 
For the solid numerical curves, only the blue curve mani¬ 
fests all 3 plateaus (like its expansion curve counterpart, 
it approaches cq for larger a ranges not shown); the other 
two do not obtain all 3 plateaus, regardless of the range 
of a (the solid red curve does not even go down as far as 
C 2 ). However, there appears to be a singularity for each 
of the expansion curves (the sharp spikes for the dashed 
curves) at certain values of cr (e) depending on A. This 
singularity emerges because our substitution of e leads to 
an expression with e in the denominator of our Af ana¬ 
lytical form, which naturally has a singularity for certain 
values of e depending on A. Following the precedent set 
for Nt = 2 and Nt = 3, we can proceed in the same 
way for larger Nt and perform the same analysis, where 
we expect to find higher order plateaus and the same 
behavior for limiting values of the parameters, including 
specific plateau values for Af when a and A are varied 
with respect to each other in certain ways analogous to 
that of the previous cases. We would like to remark that 
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FIG. 11. The dependence of the model error on a, for Nt = 3 
and varying A. The solid curves are numerical; the dashed 
curves are expansions derived by using our expression for e 
in Eq. (34) and substituting into f^^(x) in Eq. (7). The 
legend gives the colors (for both the dashed and solid curves) 
corresponding to each A. 



FIG. 12. The dependence of Af{a) on A for various Nt- 
Here, a minimizes Af for fixed Nt and A. Nt values for each 
curve are given in the legend. The dashed line shows a linear 
proportionality between Af{a) and A. 

plateau-like behaviors are well-known in statistical (on¬ 
line) learning in neural networks [26]. However, those 
plateaus are distinct from the plateau effects discussed 
here since they correspond to limits in the (online) learn¬ 
ing behavior due to symmetries [27, 28] or singularities 
[29, 30] in the model. 


3. Length scale just right 

In the central region (see Fig. 4), Af as a function of 
a has the shape of a valley. The optimum model, i.e., the 
model which gives the lowest error Af, is found in this 
region. For fixed Nt and A, we define the a that gives 
the global minimum of Af as a. In Fig. 12, we plot the 
behavior of Af{a) as a function of A. Again, we observe 
three regions of different qualitative behavior. For large 


FIG. 13. Same as Fig. 2, except with a replaced by yu, where 
7 = 10 for the solid curves and 7=1 for the dashed curves. 
The labels give the value of Nt for each curve. 

A, we over-regularize (as was shown in Fig. 8 ), giving 
the limiting value Afo in Eq. (13). For moderate A, we 
observe an approximately linear proportionality between 
Af{d) and A: 

Af{a) oc A. (35) 

However, for small enough A, there is vanishing regular¬ 
ization 

a^F=\im{K + XI)-^f, (36) 

A—J-O 

yielding the noise-free limit of the model: 

Nt 

fNF{x) = '^aNFjk{x,Xj). (37) 

In this case (for the Gaussian kernel), this limit exists for 
all a. The error of the noise-free model is 

4\/nf = lim Af. (38) 


B. Dependence on function scale 

We now introduce the parameter 7 into our simple one 
variable function, so that Eq. (1) becomes 

f{x) = cos{'jx). (39) 

For large values of 7 , Eq. (39) becomes highly oscilla¬ 
tory; we extend our analysis here in order to observe the 
behavior of the model in this case. 

Fig. 13 shows Af as a function of ■ja for various Nt 
while fixing A = 10“® and 7=10 (solid lines), 7 = 1 
(dashed lines). This is the same as that of Fig. 2, except 
with the additional 7 parameter. This figure demon¬ 
strates that the qualitative behaviors we observed in 
Fig. 2 persist with the inclusion of the 7 parameter, com¬ 
plete with the characteristic “valley” shape emerging in 
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FIG. 14. Same as Fig. 3, except with a replaced by ya, where 
7 = 10 for the solid curves and 7 = 1 for the dashed curves. 
The labels give the value of A for each curve. 

the moderate a region for each Nt- Similarly, we see that 
Aj decreases nearly monotonically for increasing for 
all 7 cr, while opening up to the left as the Gaussians 
are better able to interpolate the function. The cusps, 
though not as pronounced, are still present to the left of 
the valleys, and their general shapes remain the same for 
increasing 

Fig. 14 shows Af as a function of ycr for various A 
while fixing Nt = 33 and 7=10 (solid lines), 7 = 1 
(dashed lines). This is the same as Fig. 3, except with 
the 7 parameter included. Like in Fig. 3, as A decreases 
Af decreases nearly monotonically. The same qualita¬ 
tive features still hold, including the splitting-off of each 
lower-valued A curve further along cr. 

Next, we look at how the optimal model depends on 
Nt- In Fig. 15, we plot Af{a) as a function of Nt, 
for various 7 . For small Nt, there is little to no im¬ 
provement in the model, depending on 7 . For large 7 , 
f^^{x) is rapidly varying and the model requires more 
training samples before it can begin to accurately fit the 
function. At this point, Af{a) decreases as where 
c « 27 is a constant independent of 7 . This fast learn¬ 
ing rate drops off considerably when Af is on the order 
of A (i.e., at the limit of machine precision), and Af{a) 
levels off (as A corresponds to the leeway the model has 
for fitting training f{x) values, i.e., to the accuracy with 
which the model can resolve errors during fitting, it can¬ 
not improve the error much beyond this value). In fact, 
it is known that the learning rate in the asymptotic limit 
is 1/Nt for faithful models (i.e., models that capture the 
structure of the data) and l/^/Nr for unfaithful models 
[16, 31]. However, before the regularization kicks in Af 
is approximately the noise-free limit Z\/np. If the noise- 
free limit were taken for all Nt, it appears that Af{a) 
would decrease continually at the same learning rate: 

A\/nf oc (40) 

The learning rate here resembles the error decay of an 
integration rule, as our simple function is smooth and 



FIG. 15. The dependence of Af(a) on Nt for various 7 . 
Here, a minimizes Af for fixed Nt and A. The solid portion 
of the line represents the limit at A —>■ 0 (the noise-free curve), 
while the dot-dashed continuation shows the decay for hnite A 
(A = is shown here). For large enough Nt and A —>■ 0, 

Af has the asymptotic form given approximately by the linear 
fit here (dashed line). Note that, although this asymptotic 
form is independent of 7 , for larger 7 the asymptotic region 
is reached at larger Nt- 

can always be approximated locally by a Taylor series 
expansion with enough points on the interval. However, 
the model here uses an expansion of Gaussian functions 
instead of polynomials of a particular order, and the error 
decays much faster than a standard integration rule such 
as Simpson’s, which decays as Nf^'^ in the asymptotic 
limit. Additionally, Eq. (40) is independent of 7 since, for 
large enough Nt, the functions appear no more complex 
locally. The larger y-intercepts for the larger 7 curves in 
Fig. 15 arise due to the larger number of points needed 
to reach this asymptotic regime, so the errors should be 
comparatively larger. 

C. Cross-validation 

In previous works (Ref. [10, 12]) applying ML to DFT, 
the hyperparameters of the model were optimized in or¬ 
der to find the best one, i.e., we needed to find the hy¬ 
perparameters such that the error for the model is min¬ 
imal on the entire test set, which has not been seen 
by the machine in training [ 8 ]. We did this by using 
cross-validation, a technique whereby we minimize the 
error of the model with respect to the hyperparameters 
on a partitioned subset of the data we denote as the 
validation set. Only after we have chosen the optimal 
hyperparameters through cross-validation do we test the 
accuracy of our model by determining the error on the 
test set. We focus our attention on leave-one-out cross- 
validation, where the training set is randomly partitioned 
into Nt bins of size one (each bin consisting of a distinct 
training sample). A validation set is formed by taking 
the first of these bins, while a training set is formed from 
the rest. The model is trained on the training set, and 
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FIG. 16. The dependence of Af on a, for A = 10“® and 
Nt = 9 (curve shown in red with dashed lines) and 33 (curve 
shown in blue with solid lines). The crosses denote Af for 
the optimized cr values found from performing leave-one-out 
cross-validation (some of these are degenerate, so there are 
less than Nt distinct crosses shown), while the dots denote 
Af{a), the global minimum of Af over a (the crosses and 
dots are matched in color with the curves for each Nt). 


optimal hyperparameters are determined by minimizing 
the error on the singleton validation set. This procedure 
is repeated for each bin, so Nt pairs of optimal hyper¬ 
parameters are obtained in this manner; we take as our 
final optimal hyperparameters the median of each hyper¬ 
parameter on the entire set of obtained hyperparameters. 
The generalization error of the model with optimal hy¬ 
perparameters will finally be tested on a test set, which 
is inaccessible to the machine in cross-validation. 

Our previous works [10-12] demonstrated the efficacy 
of cross-validation in producing an optimal model. Our 
aim here is to show how this procedure optimizes the 
model for our simple function on evenly-spaced training 
samples. We have thus far trained our model on evenly 
spaced points on the interval [0,1]: Xj = (j — 1)/ {Nt — 1) 
for j = 1,..., Nt- We want to compare how the model 
error determined in this way compares to the model er¬ 
rors using leave-one-out cross-validation to obtain opti¬ 
mal hyperparameters. In Fig. 16, we plot the model er¬ 
ror over a range of a values (we fix A = 10“® and we 
use Nt = 9 and Nt = 33; compare this with Fig. 2). 
For each Nt, we perform leave-one-out cross-validation 
(using our fixed A so that we obtain optimal a), yield¬ 
ing Nt optimal cr values; we plot the model errors for 
each of these a. We also include the global minimum 
error Af{a) for each Nt to show how they compare to 
the errors for the optimal a. Looking at Fig. 16, we see 
that the optimal a values all yield errors near Af{a) and 
within the characteristic “valley” region, demonstrating 
that leave-one-out cross-validation indeed optimizes our 
model. With this close proximity in error values estab¬ 
lished, we can thus reasonably estimate the error of the 
model for the optimal cr (for a given A) by using a. 


A canonical quantum system used frequently to ex¬ 
plore basic quantum principles and as a proving ground 
for approximate quantum methods is the particle in a 
box. In this case, we confine one fermion to a Id box 
with hard walls at a; = 0,1, with the addition of the ex¬ 
ternal potential v{x) in the interval x G [0,1]. The equa¬ 
tion that governs the quantum mechanics is the familiar 
one-body Schrodinger equation 

+vix)^ (fix) = ecfix). (41) 

A solution of this equation gives the orbitals i’ji'^) 
energies €j . For one fermion, only the lowest energy level 
is occupied. The total energy is A = ei, the potential 
energy is 


V = 


dx n{x)v{x) 


(42) 


(where n{x) = \(j){x)\^ is the electron density), and the 
KE is T = E — V. In the case of one particle, the KE 
can be expressed exactly in terms of the electron density, 
known as the von Weizsacker functional [32] 


rj^W 



(43) 


where n'{x) = dn/dx. Our goal here in this section is 
not to demonstrate the efficacy of ML approximations 
for the KE in DFT (which is the subject of other works 
[10, 11]), but rather to study the properties of the ML 
approximations with respect to those applications. 

We choose a simple potential inside the box, 

v{x) = —Usin^ -Kx, (44) 


to model a well of depth D, which has also been used 
in the study of semiclassical methods [33]. To generate 
reference data for ML to learn from, we solve Eq. (41) 
numerically by discretizing space onto a uniform grid, 
Xj = {j - 1 )/{Ng - 1), for j = 1,..., Ng, where Ng is 
the number of grid points. Numerov’s method is used to 
solve for the lowest energy orbital and its corresponding 
eigenvalue. We compute T and n{x), which is represented 
by its values on the grid. For a desired number of training 
samples Nt, we vary D uniformly over the range [0,100], 
inclusive, generating Nt pairs of electron densities and 
exact KEs. Additionally, a test set with 500 pairs of 
electron densities and exact KEs is generated. 

As in the previous sections, we use KRR to learn the 
KE of this model system. The formulation is identical to 
that of Ref. [10]: 


N'T 


T“^[n] = 


(45) 
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FIG. 17. The error of the model, AT (Hartree), as a function 
of (T, for fixed A = 10“®. Nt values for each curve are given 
in the legend. 

where k is the Gaussian kernel 

k[n,n'] = exp(-||n - n'||V(2cr^)), (46) 

and 

Ng 

\\n - n'f = Ax'^{n{xj) - n'(xj)f, (47) 

i=i 

where Ax = l/(iVG — 1) is the grid spacing. The weights 
are again given by Eq. (6), found by minimizing the cost 
function in Eq. (4). 

In analogy to Eq. (7), we measure the error of the 
model as the total squared error integrated over the in¬ 
terpolation region 

^100 

AT= dD{T^^[nD]-T[nD]f, (48) 

Jo 

where nu is the exact density for the potential with well 
depth D, and T[nD] is the exact corresponding KE. We 
approximate the integral by Simpson’s rule evaluated on 
D sampled over the test set (i.e., 500 values spaced uni¬ 
formly over the interpolation region). This sampling is 
sufficiently dense over the interval to give an accurate 
approximation to AT. 

In Fig. 17, we plot AT as a function of the length 
scale of the Gaussian kernel, cr, for various training set 
sizes Nt- Clearly, the trends are very similar to Fig. 2: 
the transition CTs between the regions I and II becomes 
smaller as Nt increases, the valley in region II widens, 
and region III on the right remains largely unchanged. 
The dependence of CTs on Nt appears to follow the same 
power law as oc N^, but the value of p is different in 
this case. A rough estimate yields p « —0.8, which is 
similar to p = — 1 for the simple cosine function explored 
in the previous sections, but the details will depend on 
the specifics of the data. 

Similarly, Fig. 18 shows the same plot but with Nt 
fixed and A varied. Again, the same features are present 



log (7 

FIG. 18. The error of the model, AT (Hartree), as a function 
of cr, for various A with Nt = 33. The labels give the value 
of A for each curve. 


as in Fig. 3, i.e., three regions with different qualitative 
behaviors. In region I, AT has the same decay shape as 
the kernel functions (Gaussians) begin to overlap signif¬ 
icantly, making it possible for the regression to function 
properly and fit the data. For large values of the regu¬ 
larization strength A, the model over-regularizes, yielding 
high errors for any value of tr. As A decreases, the weights 
are given more flexibility to conform to the shape of KE 
functional. Using the same definition for the estimation 
of as in Eq. (10), the median nearest neighbor distance 
over this training set gives as = 0.019. We then have 
logo's = —1.72, which matches the boundary between re¬ 
gions I and II in Fig. 18. In region III, the same plateau 
features emerge for small A. Again, these plateaus occur 
when polynomial forms of the regression model become 
dominant for certain combinations of the parameters a, 
A, and Nt- 

From Eq. (12) and Eq. (13), we showed that the model 
error will tend to the benchmark error while ct —>■ 0 or 
A —?> oo. Similarly to Eq. (8), we can also define the 
benchmark error when T^^ [n] = 0 for this data set as 

^100 

ATo= / dOT^lno]. (49) 

Jo 

Evafuating the above integral numerically on the test set, 
we obtain logATo = 3.7. This matches the error when 
CT —?► 0 in Fig. 17 and Fig. 18. 

We define the a that gives the global minimum of AT 
as cr; similarly to Fig. 12, we plot the optimal model 
error AT{a) as a function of A in Fig. 19. For large A, 
we overregularize the model; the model error tends to the 
benchmark error in Eq. (49). For moderate A, we observe 
the same linear proportionality AT{a) oc A as in Fig. 12. 

In this toy system, the prediction of the KE from KRR 
models shares properties similar to those that we ex¬ 
plored in learning the Id cosine function. Now we will 
consider up to 4 noninteracting spinless fermions under 
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FIG. 19. The dependence of AT{a) on A for various Nt- 
Here, a minimizes Af for fixed Nt and A. Nt values for each 
curve are given in the legend. 

a potential with 9 parameters as reported in Ref. [10]. 

3 

v{x) = - ^ Oj exp [-(a; - bif/{2c^)] . (50) 

These densities are represented on Nq = 500 evenly 
spaced grid points in 0 < a: < 1. Here a model is built 
using Nt/4: pairs of electron densities and exact KEs for 
each particle number N = 1,2, 3,4, respectively. Thus, 
the size of the training set is Nx- 1000 pairs of electron 
densities and exact KEs are generated for each N, so the 
size of the test set is .S' = 4000. Since there are 9 param¬ 
eters in the potential, we cannot define the error as an 
integral, so we use summation instead. Thus, the error 
on the test set is defined as the mean square error (MSE) 
on the test densities 

s 

AT = J2{T^^[n,]-T[n,]r/S. (51) 

1=1 

Fig. 1 shows the error of the model as a function of 
a with various Nx for fixed A = 10“^°. Although this 
system is more complicated than the previous two sys¬ 
tems discussed in this paper, the qualitative behaviors in 
Fig. 1 are similar to Fig. 2 and Fig. 17. Table I in Ref. [10] 
only shows the model error with optimized hyperparam¬ 
eters for Nx = 400. In Fig. 20, model errors varying 
with a wide range of a values are shown for 4 different 
values of A [34]. The qualitative properties in Fig. 20 
are similar to Fig. 3 and Fig. 18. For example, the ex¬ 
istence of three regions with distinctly different behavior 
for the model error can be ascertained just like before. 
In region I, error curves with different A will all tend 
to the same benchmark error limit when cr —>■ 0. The 
median nearest neighbor distance over this training set 
gives CTs = 0.022. In Fig. 20, the boundary between re¬ 
gion I and region II is well estimated by logo's = —1.66. 
In region III, the familiar plateau features emerge. In 
region II, where a is such that the model is optimal or 


FIG. 20. Dependence of the model error (as a function of a) 
for various A. The labels give the value of A for each curve. 
The A = 3.2 x 10~^^ curve is not plotted for logo- > 5.5 due 
to the numerical instability that occurs when A is small for 
large a. 

close to it, we find that the model with hyperparame¬ 
ters a = 1.86, A = 3.2 x 10“^'* performs the best. The 
MSE for this model is 1.43 x 10“^ Hartree. Another com¬ 
mon measure of error is the mean absolute error (MAE), 
which is also used in Ref. [10]. The MAE of this model is 
1.99X10“^ Hartree = 0.12kcal/mol. This result is consis¬ 
tent with the model performance reported in Ref. [10] [35]. 


V. CONCLUSION 

In this work, we have analyzed the properties of KRR 
models with a Gaussian kernel applied to fitting a simple 
Id function. In particular, we have explored regimes of 
distinct qualitative behavior and derived the asymptotic 
behavior in certain limits. Finally, we generalized our 
findings to the problem of learning the KE functional of 
a single particle confined to a box and subject to a well 
potential with variable depth. Considering the vast dif¬ 
ference in nature of the two problems compared in this 
work, a Id cosine function and the KE as a functional of 
the electron density (a very high-dimensional object), the 
similarities of the measures of error Af and AT between 
each other are remarkable. This analysis demonstrates 
that much of the behavior of the model can be rational¬ 
ized by and distilled down to the properties of the kernel. 
Our goal in this work was to deepen our understanding 
of how the performance of KRR depends on the param¬ 
eters qualitatively, in particular in the case that is rele¬ 
vant for MLA in DFT, namely the one of noise-free data 
and high-dimensional inputs, and how one may deter¬ 
mine a-priori which regimes the model lies in. From the 
ML perspective the scenario analyzed in this work was 
an unusual one: small data, virtually no noise, low di¬ 
mensions and high complexity. The effects found are not 
only interesting from the physics perspective, but are also 
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illuminating from a learning theory point of view. How¬ 
ever, in ML practice the extremes that contain plateaus 
or the “comb” region will not be observable, as the prac¬ 
tical data with its noisy manifold structure will confine 
learning in the favorable region II. Future work will fo¬ 
cus on theory and practice in order to improve learning 
techniques for low noise problems. 
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Appendix B: Afp for Nt = 3 


^//S — ^01 ~ + /a) + 

, 2/3 hiiifi — /a) , , 


fi — /a 

P + l 


where 


/3 + 1 


G, 


(HI) 


G = + 2 /i( 4/2 + /a) + 4/| + 8 / 2 /a 

+ 7/1) + 36(2/3 + 1)7')/(36(/3 + 1 )^), (B2) 

and where / = |(/i + /2 + /a)- 


Appendix C: Afe for Nt = 3 


Z\/g — Gi Hqi + C 2 hii + G 3 ft .21 + ho2 + G 4 , (Cl) 


Appendix A: Expansion of Af for a ^ as 


where 

Gi = ((VA(487 - e^{17h + 4/2 + /a)) 

- 2 e{h{e^ - 20) - 8/2 + 4/3) - 4Ae(/2 + 4/3)))/ 
((yA + e)(- 8 (A + 3) + e2 + 8 VAe)), (C2) 


Nt ol 


7 = 1 


Nt 

E< 


G 2 = { 2 e{ 2 VXe{ 9 fi + 2/2 + fs) + 3 fie^ - 24/i 
+ 8 A (/2 + 2 / 3 ) - 4 / 26 ^ + / 3 e 2 + 24 / 3 ))/ 



((V/ -k e)(- 8 (A -k 3) -k £2 -k 8 V/£)), 

(C3) 

(Al) 

4£ (ei-h + 2/2 - /a) - 12 /v^) 



^ - 8 (A-k 3)-k £2-k 8 yA£ 

(C4) 


where the first integral is given in Eq. (9). Next 

J = (A 2 ) 

where 


C,. = ( erf ( 7^^ ] + erf ^ ^ 


tV2 


tV2 


(A3) 

erf is the error function, and G* denotes the complex 
conjugate of G. The last integral is 


dxk(x,Xi)k{x,Xj) = 




X ( erf ( _ erf { ^ 1 1 . (A4) 


2 ct 


2cr 


Ci = {2^e[e\7bfl + 2/1(68/2 - 6/3) + 48/^ 

+ 116/2/a + 7bfl) - + 5/1/2 

+ 2/1/3 + 2/1 + 5/2/3 + 5/1) + 34560/') 

+ 3A(e4 (273/2 + 232/1/2 + 226/1/3 + 48/| 

+ 232/2/3 + 273/1) - 160e2(7/2 + 15/i(/2 
+ 2/3) + 4/1 + 15/2/3 + 7 fl ) + 11520/') 

+ 8A3/2e(e2(40/2 + gi + 400/1/3 + 16/| 

+ 91/2/a + 40/1) - 240/(4/1 + /2 + 4/3)) 

+ 4e®(2/i -|- 2/1/2 — /i/a + 8/| J- 2/2/3 + 2/3) 

- 80e^(5/2 + 10/1/2 - 2/1/3 + 8 fi + 10/2/3 

+ 5/3) + 16A2e2(48/i -I- 16 /i(/2 -I- fs) + 3/| 

+ 16/2/a + 48/1) + 960e2(7/2 + 2/i(4/2 + /a) 

+ 4/1 -I- 8/2/a + 7/3))/ 

(60(7/ -k e)2(-8(A -k 3) -k £2 + 8yAe)2), (C5) 

and where / = ^(/i + /2 + /a)- 
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